83 results found.
Written
Lexicon,
Language Type:
Multilingual
Languages:
Abron Acehnese Afar Arabic Baharna Arabic Mesopotamian Arabic
Availability:
Freely Available
License:
CreativeCommons
Size:
50000 tokens
Production Status:
Newly created-in progress
Use:
Text Mining
-
Paper title:Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus
-
Paper track:Long paper/
-
Paper status:Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Isaac Caswell | TF-IDF-IIF top100 wordlists | /N |
Documentation:
https://github.com/google-research-datasets/TF-IDF-IIF-top100-wordlists
Written
Corpus,
Language Type:
Multilingual
Languages:
Arabic Chinese English
Availability:
From Owner
License:
LDC User Agreement for Non-Members
Size:
- MByte
Production Status:
Existing-used
Use:
structural information
-
Paper title:CxGBERT: BERT meets Construction Grammar
-
Paper track:Long paper/
-
Paper status:Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Harish Tayyar Madabushi | OntoNotes Release 5.0 | /N |
Documentation:
https://catalog.ldc.upenn.edu/docs/LDC2013T19/
Written
Corpus,
Language Type:
Bilingual
Languages:
Arabic Egyptian Arabic
Availability:
From Data Center(s)
License:
LDC
Size:
118568 KByte
Production Status:
Existing-used
Use:
Text Normalization
-
Paper title:Phonetic and Visual Priors for Decipherment of Informal Romanization
-
Paper track:Long/Phonology, Morphology and Word Segmentation
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Maria Ryskina | BOLT Egyptian Arabic SMS/Chat and Transliteration | /N |
Documentation:
None
Written
Evaluation Data,
Language Type:
Multilingual
Languages:
Arabic German Turkish
Availability:
Freely Available
License:
Apache License 2.0
Size:
814 sentences
Production Status:
Newly created-finished
Use:
Document Classification, Text categorisation
-
Paper title:Multi-Label and Multilingual News Framing Analysis
-
Paper track:Long/NLP Applications
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Afra Feyza Akyürek | Multilingual Gun Violence Frame Corpus | /N |
Documentation:
None
Written
Corpus,
Language Type:
Multilingual
Languages:
Arabic Chinese English German Hindi Spanish Vietnamese
Availability:
Freely Available
License:
Size:
50+ GByte
Production Status:
Existing-used
Use:
Machine Learning
-
Paper title:MLQA: Evaluating Cross-lingual Extractive Question Answering
-
Paper track:Long/Question Answering
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Patrick Lewis | Wikipedia | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Multilingual
Languages:
Arabic English French
Availability:
Freely Available
License:
Creative Commons Attribution-NonCommercial-NoDerivs 3.0 license
Size:
18.3 MByte
Production Status:
Existing-used
Use:
word classification
-
Paper title:Token-Level Supervised Contrastive Learning for Punctuation Restoration
-
Paper track:9.3 Language modelling/Oral Presentation
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Qiushi Huang | International Workshop on Spoken Language Translation | /N |
Documentation:
http://hltc.cs.ust.hk/iwslt/index.php/evaluation-campaign/ted-task.html
Language Type:
Multilingual
Languages:
Arabic
Availability:
From Owner
License:
<Not Specified>
Size:
10 MByte
Production Status:
Newly created-in progress
Use:
Emotion Recognition/Generation
-
Paper title:SentiArabic: A Sentiment Analyzer for Standard Arabic
-
Paper track:Multimodality
-
Paper status:Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Author 1 | Ramy Eskander | Ramitechs | US |
| Main Contact | Ramy Eskander | Columbia University | None |
Documentation:
In progress
Language Type:
Multilingual
Languages:
Arabic Chinese English Finnish Hindi
Availability:
Freely Available
License:
BSD 3
Size:
<Not Specified>
Production Status:
Existing-updated
Use:
Corpus Creation/Annotation
-
Paper title:Gold Standard Annotations for Preposition and Verb Sense with Semantic Role Labels in Adult-Child Interactions
-
Paper track:Resource paper
-
Paper status:Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Author 1 | Lori Moon | University of Illinois at Urbana-Champaign | None |
| Author 2 | Christos Christodoulopoulos | Amazon | GB |
| Author 3 | Cynthia Fisher | University of Illinois | US |
| Author 4 | Sandra Franco | University of Illinois at Urbana-Champaign | N/A |
| Author 5 | Dan Roth | University of Illinois | US |
| Main Contact | Lori Moon | University of Illinois at Urbana-Champaign | None |
Documentation:
https://www.colorado.edu/ics/sites/default/files/attached-files/techreport02-09-jubilee.pdf
Written
Corpus,
Language Type:
Multilingual
Languages:
Ancient Greek Arabic Chinese English Finnish Hebrew Korean Russian Swedish
Availability:
Freely Available
License:
CreativeCommons, GNU
Size:
11814230 tokens
Production Status:
Existing-used
Use:
Parsing and Tagging
-
Paper title:The (Non-)Utility of Structural Features in BiLSTM-based Dependency Parsers
-
Paper track:Long/Tagging, Chunking, Syntax and Parsing
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Agnieszka Falenska | Universal Dependencies 2.0 | /N |
Documentation:
https://universaldependencies.org/v2/
Written
Corpus,
Language Type:
Multilingual
Languages:
Afrikaans Albanian Amharic Arabic Aragonese Armenian Assamese Azerbaijani Basque Belarusian Bengali Bosnian Breton Bulgarian Burmese Catalan Central Khmer Chinese Croatian Czech Danish Dutch Dzongkha English Esperanto Estonian Finnish French Gaelic Galician Georgian German Greek Gujarati Hausa Hebrew Hindi Hungarian Icelandic Igbo Indonesian Irish Italian Japanese Kannada Kazakh Kinyarwanda Korean Kurdish Kyrgyz Latvian Limburgan Lithuanian Macedonian Malagasy Malay Malayalam Maltese Marathi Mongolian Nepali Northern Sami Norwegian Norwegian Bokmål Norwegian Nynorsk Occitan Oriya Panjabi Pashto Persian Polish Portuguese Romanian Russian Serbian Serbo-Croatian Sinhala Slovak Slovenian Spanish Swedish Tajik Tamil Tatar Telugu Thai Turkish Turkmen Uighur Ukrainian Urdu Uzbek Vietnamese Walloon Welsh Western Frisian Xhosa Yiddish Yoruba Zulu
Availability:
Freely Available
License:
Size:
55 million sentences
Production Status:
Existing-used
Use:
Machine Translation, SpeechToSpeech Translation
-
Paper title:Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation
-
Paper track:Long/Machine Translation
-
Paper status:Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Biao Zhang | the open parallel corpus (OPUS) | /N |
Documentation:
None




